• Evaluation of the power efficiency of UPC, OpenMP and MPI 

      Lagraviere, Jeremie; Ha, Hoai Phuong; Cai, Xing (Research report; Forskningsrapport, 2015)
      In this study we compare the performance and power efficiency of Unified Parallel C (UPC), MPI and OpenMP by running a set of kernels from the NAS Benchmark. One of the goals of this study is to focus on the Partitioned Global Address Space (PGAS) model, in order to describe it and compare it to MPI and OpenMP. In particular we consider the power efficiency expressed in millions of operations ...
    • Implementing and optimizing a Sparse Matrix-Vector Multiplication with UPC 

      Lagraviere, Jeremie Alexandre Emilien; Prugger, Martina; Einkemmer, Lukas; Langguth, Johannes; Ha, Hoai Phuong; Cai, Xing (Research report; Forskningsrapport, 2016)
      Programmability and performance-per-watt are the major challenges of the race to Exascale. In this study we focus on Partitioned Global Address Space (PGAS) languages, using UPC as a particular example. This category of parallel languages provides ease of programming as a strong advantage over the classic Message Passing Interface (MPI). PGAS also has advantages compared to classic shared memory ...
    • On the performance and energy efficiency of the PGAS programming model on multicore architectures 

      Lagraviere, Jeremie Alexandre Emilien; Langguth, Johannes; Sourouri, Mohammed; Ha, Hoai Phuong; Cai, Xing (Journal article; Tidsskriftartikkel; Peer reviewed, 2016-09-15)
    • Performance optimization and modeling of fine-grained irregular communication in UPC 

      Lagraviere, Jeremie Alexandre Emilien; Langguth, Johannes; Prugger, Martina; Einkemmer, Lukas; Ha, Hoai Phuong; Cai, Xing (Journal article; Tidsskriftartikkel; Peer reviewed, 2019-03-03)
      The Unified Parallel C (UPC) programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory subsystems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, ...
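      The shared-array feature this abstract describes can be illustrated with a short UPC fragment. This is a sketch of standard UPC constructs, not code from the article; it requires a UPC compiler (e.g. Berkeley UPC) and will not build with a plain C toolchain.

```c
/* Illustrative UPC fragment: a shared array spans all threads, yet any
   thread may index any element; remote reads trigger the implicit
   fine-grained communication that the article optimizes. */
#include <upc.h>

#define NPER 100
shared double a[NPER * THREADS];   /* cyclic distribution across threads */

double sum_all(void)
{
    /* Each thread initializes only the elements it owns
       (affinity expression &a[i]). */
    upc_forall (int i = 0; i < NPER * THREADS; i++; &a[i])
        a[i] = (double)i;
    upc_barrier;

    /* Thread 0 reads the whole array; accesses to elements with
       affinity to other threads are turned into data movement by
       the UPC runtime, with no explicit messages in the source. */
    double s = 0.0;
    if (MYTHREAD == 0)
        for (int i = 0; i < NPER * THREADS; i++)
            s += a[i];
    return s;
}
```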